Summary
- Interested in finding Long Period Variables (LPVs).
- Built a classifier to sort candidates into LPVs, non-LPVs, and non-variables.
- Used upsampling and downsampling to create a balanced training set for use with a gradient-boosted decision tree classifier.
- Identified 159,696 LPVs from the PGIR survey; 73,346 newly identified.
Data
- Palomar Gattini-IR (PGIR) survey
- September 2018 – July 2021 (\(\approx\) 1400d)
- Near-Infrared J band
Method
- Build an initial bona fide training set (\(n = 1344\)).
- Extend the training set by classifying a subsample of the full survey.
- Train the gradient-boosted decision tree on the extended training dataset.
- Apply the classifier to a subset of the PGIR survey (\(N = 35\textrm{M}\)).
- Compare the labelled result with a subset of the Gaia DR3 database of LPVs (\(N = 150{,}195\)).
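The comparison step can be sketched with set operations on source identifiers. This is only an illustration under stated assumptions: the function name `compare_with_reference` and the toy identifiers are hypothetical, and it assumes the two catalogues have already been cross-matched to a shared ID. How the actual match between PGIR and Gaia sources was done is not described in these notes.

```python
def compare_with_reference(classified_ids, reference_ids):
    """Summarise overlap between classifier output and a reference catalogue.

    Both arguments are iterables of source identifiers that are assumed to
    share a common ID scheme (hypothetical; a real comparison would first
    need a cross-match between the two catalogues).
    """
    classified = set(classified_ids)
    reference = set(reference_ids)

    recovered = classified & reference   # labelled LPV and in the reference
    new = classified - reference         # labelled LPV, not in the reference
    missed = reference - classified      # in the reference, not labelled LPV

    return {
        "recovered": len(recovered),
        "new": len(new),
        "missed": len(missed),
        # Fraction of classifier-labelled LPVs absent from the reference.
        "new_fraction": len(new) / len(classified) if classified else 0.0,
    }


# Toy identifiers, purely illustrative.
pgir_lpvs = ["src_001", "src_002", "src_003", "src_004"]
gaia_lpvs = ["src_002", "src_003", "src_005"]

summary = compare_with_reference(pgir_lpvs, gaia_lpvs)
print(summary)
```

Keeping `recovered`, `new`, and `missed` separate makes it possible to distinguish candidates the reference lacks from reference LPVs the classifier did not recover.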
Features
Class Imbalance
- bona fide training set:
  - 1265 LPVs vs. 79 non-LPVs
  - ADASYN to upsample the minority class, AllKNN to downsample the majority class
- extended training set:
  - 2335 LPVs
  - 444 Type-II LPVs
  - 166 non-LPVs
  - 1332 non-variables
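The degree of imbalance can be quantified directly from the counts listed above. The sketch below is a small arithmetic illustration, not the resampling itself. The helper names are hypothetical, and `adasyn_target` uses the standard ADASYN expression \(G = (m_l - m_s)\,\beta\) for a two-class problem, with \(\beta = 1\) assumed here to mean a fully balanced target. The value of \(\beta\) actually used is not stated in these notes.

```python
# Class counts taken from the lists above.
bona_fide = {"LPV": 1265, "non-LPV": 79}
extended = {"LPV": 2335, "Type-II LPV": 444, "non-LPV": 166, "non-variable": 1332}


def class_fractions(counts):
    """Return each class's share of the total sample."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def imbalance_ratio(counts):
    """Ratio of the largest class to the smallest class."""
    return max(counts.values()) / min(counts.values())


def adasyn_target(counts, beta=1.0):
    """Number of synthetic minority samples ADASYN aims for (two classes).

    G = (m_l - m_s) * beta, where m_l and m_s are the majority and minority
    counts. beta = 1.0 (assumed) corresponds to a fully balanced set.
    """
    return round((max(counts.values()) - min(counts.values())) * beta)


for name, counts in [("bona fide", bona_fide), ("extended", extended)]:
    print(name, "total:", sum(counts.values()))
    print(name, "imbalance ratio:", round(imbalance_ratio(counts), 2))
    for label, frac in class_fractions(counts).items():
        print(f"  {label}: {frac:.1%}")

print("ADASYN target for bona fide set:", adasyn_target(bona_fide))
```

The bona fide counts sum to 1344, matching the \(n = 1344\) quoted under Method. Because AllKNN subsequently removes majority-class samples, the final class sizes after both steps will differ from this simple target.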
Training Set Confusion Matrix
Variable Importance
Comparison with Gaia
Comments